Temperature Forecasting Using Long Short Term Memory (LSTM)

Gaurav Kesireddy, Fariha Moomtaheen, Nitul Singha

2023-08-03

Introduction

  • Long Short-Term Memory (LSTM), introduced by Hochreiter and Schmidhuber (1997), is a gated Recurrent Neural Network (RNN) designed to handle long-term dependencies and to mitigate the vanishing and exploding gradient problems of traditional RNNs.

  • By incorporating forget gates, input gates, and output gates, LSTM can selectively retain crucial information from time series data based on its features while disregarding irrelevant information.

  • The central idea is a memory cell which can maintain its state over time, and non-linear gating units which regulate the information flow into and out of the cell (Greff et al. 2016).

Introduction Cont..

  • LSTM has been utilized in tasks like text categorization, sentence generation, and machine translation (Zhu et al. 2019).

  • Early on, linear models like Auto-Regressive (AR), Moving Average (MA), and Auto-Regressive Moving Average (ARMA) were proposed for time series forecasting (Wadhvani et al. 2017; Mo and Tao 2016; Ge and Kerrigan 2016).

  • The Auto-Regressive Integrated Moving Average (ARIMA) model (Ho and Xie 1998) extends ARMA with differencing operations so that it can handle non-stationary data.

Methods

Figure 1 Architecture of a typical vanilla LSTM block (Van Houdt, Mosquera, and Nápoles 2020).

Methods Cont..

  • The LSTM architecture consists of a set of recurrently connected sub-networks, known as memory blocks.

  • The idea behind the memory block is to maintain its state over time and regulate the information flow through non-linear gating units.

  • The output of the block is recurrently connected back to the block input and all of the gates.

Methods Cont..

Let us assume a network comprising N processing blocks and M inputs. The forward pass in this recurrent neural system is described in six parts.

Block input. This step updates the block input component, which combines the current input \(x^{(t)}\) with the output of that LSTM unit in the last iteration, \(y^{(t-1)}\). Here \(W_z\) and \(R_z\) are the input and recurrent weight matrices and \(b_z\) is the bias vector. This can be done as shown below: \[ z^{(t)} = g(W_z x^{(t)} + R_z y^{(t-1)} + b_z) \tag{1} \]

Methods Cont..

Input gate. During this step, we update the input gate, which combines the current input \(x^{(t)}\), the output of that LSTM unit \(y^{(t-1)}\) and the cell value \(c^{(t-1)}\) in the last iteration. This can be done as shown below: \[ i^{(t)} = \sigma(W_i x^{(t)} + R_i y^{(t-1)} + p_i \odot c^{(t-1)} + b_i) \tag{2} \]

Methods Cont..

Forget gate. During this step, the LSTM unit determines which information should be removed from its previous cell states \(c^{(t-1)}\). Therefore, the activation values \(f^{(t)}\) of the forget gates at time step t are calculated from the current input \(x^{(t)}\), the outputs \(y^{(t-1)}\) and the state \(c^{(t-1)}\) of the memory cells at the previous time step (t-1), the peephole connections \(p_f\), and the bias terms \(b_f\) of the forget gates. This can be done as shown below: \[ f^{(t)} = \sigma(W_f x^{(t)} + R_f y^{(t-1)} + p_f \odot c^{(t-1)} + b_f) \tag{3} \]

Methods Cont..

Cell. This step computes the cell value, which combines the block input \(z^{(t)}\), the input gate \(i^{(t)}\) and the forget gate \(f^{(t)}\) with the previous cell value \(c^{(t-1)}\). This can be done as shown below:

\[ c^{(t)} = z^{(t)} \odot i^{(t)} + c^{(t-1)} \odot f^{(t)} \tag{4} \]
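As a worked scalar example of the cell update, take the illustrative values \(z^{(t)} = 0.5\), \(i^{(t)} = 0.8\), \(f^{(t)} = 0.5\) and \(c^{(t-1)} = 1.0\) (chosen by hand, not taken from the trained model):

\[ c^{(t)} = 0.5 \times 0.8 + 1.0 \times 0.5 = 0.4 + 0.5 = 0.9 \]

Here the forget gate keeps half of the old memory, while the input gate lets 80% of the new candidate value through.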

Methods Cont..

Output gate. This step calculates the output gate, which combines the current input \(x^{(t)}\), the output of that LSTM unit in the last iteration \(y^{(t-1)}\) and, through its peephole connection, the current cell value \(c^{(t)}\) computed in the previous step. This can be done as shown below:

\[ o^{(t)} = \sigma(W_o x^{(t)} + R_o y^{(t-1)} + p_o \odot c^{(t)} + b_o) \tag{5} \]

Methods Cont..

Block output. Finally, we calculate the block output, which combines the current cell value \(c^{(t)}\) with the current output gate value as follows:

\[ y^{(t)} = h(c^{(t)}) \odot o^{(t)} \tag{6} \]

In the above steps, \(\sigma\), g and h denote point-wise non-linear activation functions, and \(\odot\) denotes point-wise multiplication.

  • The logistic sigmoid \(\sigma(x) = 1/(1+e^{-x})\) is used as the gate activation function, and the hyperbolic tangent \(g(x) = h(x) = \tanh(x)\) is often used as the block input and output activation function (Van Houdt, Mosquera, and Nápoles 2020).
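The six parts of the forward pass can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the implementation used in this project: the helper names `lstm_step` and `random_params`, the random weights and the sizes N = 7, M = 5 are all assumptions made for the example. It follows the peephole formulation of Greff et al. (2016), in which the output gate looks at the current cell value \(c^{(t)}\), and it uses the standard logistic sigmoid \(1/(1+e^{-x})\) for the gates.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), used for all three gates."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, y_prev, c_prev, p):
    """One forward step of a peephole LSTM block.

    x: current input, shape (M,)
    y_prev, c_prev: previous block output and cell value, shape (N,)
    p: dict of weights W_* (N, M), R_* (N, N), biases b_* (N,)
       and peephole vectors p_i, p_f, p_o (N,)
    """
    z = np.tanh(p["W_z"] @ x + p["R_z"] @ y_prev + p["b_z"])                      # block input
    i = sigmoid(p["W_i"] @ x + p["R_i"] @ y_prev + p["p_i"] * c_prev + p["b_i"])  # input gate
    f = sigmoid(p["W_f"] @ x + p["R_f"] @ y_prev + p["p_f"] * c_prev + p["b_f"])  # forget gate
    c = z * i + c_prev * f                                                        # cell
    o = sigmoid(p["W_o"] @ x + p["R_o"] @ y_prev + p["p_o"] * c + p["b_o"])       # output gate
    y = np.tanh(c) * o                                                            # block output
    return y, c

def random_params(n_blocks, n_inputs, seed=0):
    """Hypothetical small random weights, for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in "zifo":
        p[f"W_{name}"] = rng.normal(scale=0.1, size=(n_blocks, n_inputs))
        p[f"R_{name}"] = rng.normal(scale=0.1, size=(n_blocks, n_blocks))
        p[f"b_{name}"] = np.zeros(n_blocks)
    for name in "ifo":
        p[f"p_{name}"] = rng.normal(scale=0.1, size=n_blocks)
    return p

N, M = 7, 5                      # N blocks, M inputs (illustrative sizes)
params = random_params(N, M)
y, c = np.zeros(N), np.zeros(N)  # initial output and cell value
inputs = np.random.default_rng(1).uniform(-1, 1, size=(3, M))
for x_t in inputs:               # run three time steps
    y, c = lstm_step(x_t, y, c, params)
print(y.shape, c.shape)
```

Because the block output is a tanh value multiplied by a sigmoid value, every entry of `y` stays strictly inside (-1, 1), which is one reason the target series is rescaled before training.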

Dataset Description

  • The weather forecasting dataset for Indian climate in the city of Delhi, India, covers a period from 1st January 2013 to 24th April 2017.

  • The dataset includes four key parameters, each providing insight into the weather conditions during this time frame: mean temperature, humidity, wind speed, and mean pressure.

  • We considered only the mean temperature (meantemp) for this analysis, so all other attributes were removed from the dataset.

Dataset Description Cont..

The first six observations of the dataset are:

      date  meantemp
1 1/1/2013 10.000000
2 1/2/2013  7.400000
3 1/3/2013  7.166667
4 1/4/2013  8.666667
5 1/5/2013  6.000000
6 1/6/2013  7.000000

About the Dataset: The dataset was collected from the Weather Underground API and prepared as a part of Assignment 4 of the Data Analytics Course in 2019 at PES University, Bangalore. It is important to note that the ownership and credit for this dataset belong to Weather Underground due to its data source.

Visualization

The time series plot of the dataset is given below, with date on the x-axis and meantemp on the y-axis.

Statistical Modeling

We used a min-max transformation for data preparation. The model is a simple LSTM with one LSTM layer followed by a Dense layer as the output layer. The model is then compiled with a loss function, an optimizer and evaluation metrics. The package used for modeling is based on the Keras and TensorFlow modules (Paul and Garai 2021).

Min-Max Transformation

We applied a min-max transformation to the mean temperature to rescale it to the range [-1, 1], since the model uses tanh as its activation function.
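A minimal sketch of this rescaling in Python with NumPy is given below. The function names `minmax_scale` and `minmax_invert` are hypothetical helpers written for this example, and the input is just the first six meantemp values shown earlier, so the minimum and maximum here are not those of the full dataset.

```python
import numpy as np

def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly map values so that min -> lo and max -> hi."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    scaled = lo + (values - v_min) * (hi - lo) / (v_max - v_min)
    return scaled, (v_min, v_max)

def minmax_invert(scaled, bounds, lo=-1.0, hi=1.0):
    """Map scaled values back to the original units."""
    v_min, v_max = bounds
    return v_min + (np.asarray(scaled, dtype=float) - lo) * (v_max - v_min) / (hi - lo)

# First six meantemp observations from the dataset preview
meantemp = [10.0, 7.4, 7.166667, 8.666667, 6.0, 7.0]
scaled, bounds = minmax_scale(meantemp)
restored = minmax_invert(scaled, bounds)
print(scaled)
print(restored)
```

Keeping the original minimum and maximum (`bounds`) is what allows predictions made on the transformed scale to be converted back to degrees.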

Coding and Tuning

Usage: ts.lstm(ts=df$transformed, tsLag=5, LSTMUnits=7, DropoutRate=0.1, Epochs=10, CompLoss="mse", CompMetrics="mae", ActivationFn="tanh", SplitRatio=0.99, ValidationSplit=0.2)
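To make the roles of tsLag and SplitRatio concrete, the sketch below builds lagged inputs and a chronological train/test split in Python with NumPy. It is an assumption about what such a preparation step typically looks like, not the internals of ts.lstm: the helpers `make_lagged` and `chronological_split` are hypothetical, and a synthetic sine series stands in for the transformed temperature. The (samples, 1, lag) layout is chosen to match the model summaries that follow, whose output shape is (None, 1, 7).

```python
import numpy as np

def make_lagged(series, lag):
    """Turn a 1-D series into (inputs, targets) for one-step-ahead forecasting.

    Each input row holds the `lag` previous values; the target is the next value.
    Inputs are returned with shape (samples, 1, lag): one time step, `lag` features.
    """
    series = np.asarray(series, dtype=float)
    rows = [series[t - lag:t] for t in range(lag, len(series))]
    X = np.array(rows).reshape(-1, 1, lag)
    y = series[lag:]
    return X, y

def chronological_split(X, y, split_ratio):
    """Keep the first split_ratio share for training, the rest for testing."""
    n_train = int(len(X) * split_ratio)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Synthetic stand-in for the transformed temperature series
series = np.sin(np.linspace(0, 20, 200))
X, y = make_lagged(series, lag=5)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, split_ratio=0.99)
print(X.shape, train_X.shape, test_X.shape)
```

The split is not shuffled, so the test observations always come after the training observations in time.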

Lag=50

Model: "sequential"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 lstm (LSTM)                        (None, 1, 7)                    1624        
 dense (Dense)                      (None, 1, 1)                    8           
================================================================================
Total params: 1,632
Trainable params: 1,632
Non-trainable params: 0
________________________________________________________________________________

Lag=40

Model: "sequential_1"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 lstm_1 (LSTM)                      (None, 1, 7)                    1344        
 dense_1 (Dense)                    (None, 1, 1)                    8           
================================================================================
Total params: 1,352
Trainable params: 1,352
Non-trainable params: 0
________________________________________________________________________________

Lag=30

Model: "sequential_2"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 lstm_2 (LSTM)                      (None, 1, 7)                    1064        
 dense_2 (Dense)                    (None, 1, 1)                    8           
================================================================================
Total params: 1,072
Trainable params: 1,072
Non-trainable params: 0
________________________________________________________________________________

Lag=20

Model: "sequential_3"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 lstm_3 (LSTM)                      (None, 1, 7)                    784         
 dense_3 (Dense)                    (None, 1, 1)                    8           
================================================================================
Total params: 792
Trainable params: 792
Non-trainable params: 0
________________________________________________________________________________

Lag=10

Model: "sequential_4"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 lstm_4 (LSTM)                      (None, 1, 7)                    504         
 dense_4 (Dense)                    (None, 1, 1)                    8           
================================================================================
Total params: 512
Trainable params: 512
Non-trainable params: 0
________________________________________________________________________________

Lag=5

Model: "sequential_5"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 lstm_5 (LSTM)                      (None, 1, 7)                    364         
 dense_5 (Dense)                    (None, 1, 1)                    8           
================================================================================
Total params: 372
Trainable params: 372
Non-trainable params: 0
________________________________________________________________________________
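The parameter counts in the summaries above can be checked by hand. The sketch below, in plain Python, uses the count for a standard Keras LSTM layer, which has four weight sets (block input plus three gates) and no peephole weights; the function names are hypothetical helpers for this example. The counts match when the lag is treated as the number of input features.

```python
def lstm_param_count(units, input_dim):
    """Parameters of an LSTM layer without peepholes.

    Four weight sets, each with input weights (units x input_dim),
    recurrent weights (units x units) and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

for lag in (50, 40, 30, 20, 10, 5):
    lstm_params = lstm_param_count(units=7, input_dim=lag)
    dense_params = dense_param_count(inputs=7, outputs=1)
    print(lag, lstm_params, dense_params, lstm_params + dense_params)
```

With 7 units this gives 28 × (lag + 8) LSTM parameters, which reproduces 1,624 for Lag=50 down to 364 for Lag=5, plus 8 parameters for the Dense layer in every model.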

Accuracy table for Lag=5

        RMSE   MAPE
Train 0.0787 0.1218
Test  0.1014 0.1255
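For reference, the two error measures can be computed as in the following Python sketch with NumPy. The functions `rmse` and `mape` are written for this example, and the numbers are made up for illustration. MAPE is returned here as a fraction rather than a percentage, which is an assumption, since the table does not state its unit or whether the errors are measured on the transformed or the original scale.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10%).

    Undefined when any actual value is zero.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

# Made-up values for illustration only
actual = [10.0, 8.0, 5.0]
predicted = [9.0, 8.0, 6.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

Because MAPE divides by the actual value, it becomes unstable when the actual values are close to zero, which can happen for a series rescaled to [-1, 1].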

Model Validation plot for Lag=5

Decision

Comparing the graphs for Lag = 50, 40, 30, 20, 10 and 5, we get the best prediction for Lag=5, where the prediction follows the trend of the actual data. We are therefore finalizing Lag=5 for our model.

Conclusion

Our time series prediction using LSTM has demonstrated promising forecasting accuracy. During the training phase, the root mean square error (RMSE) of 0.0800 and the mean absolute percentage error (MAPE) of 0.1258 indicate that the LSTM model was able to capture the underlying patterns and trends in the training data.

Conclusion Cont..

  • Upon evaluating the model’s performance on the test dataset, we obtained an RMSE of 0.1039 and a MAPE of 0.1274.

  • These results signify that the LSTM model successfully generalized to unseen data, showcasing its ability to make accurate predictions beyond the training data.

  • Overall, the relatively low values of both RMSE and MAPE for both the training and test phases highlight the effectiveness of the LSTM model in handling time series data.

Acknowledgement

• Ownership and collection credit for this dataset go to the Weather Underground API.

• Special thanks to Dr. Achraf Cohen for all his guidance throughout the project.

References

Ge, Ming, and Eric C Kerrigan. 2016. “Short-Term Ocean Wave Forecasting Using an Autoregressive Moving Average Model.” In 2016 UKACC 11th International Conference on Control (CONTROL), 1–6. IEEE.
Greff, Klaus, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. 2016. “LSTM: A Search Space Odyssey.” IEEE Transactions on Neural Networks and Learning Systems 28 (10): 2222–32.
Ho, Siu Lau, and Min Xie. 1998. “The Use of ARIMA Models for Reliability Forecasting and Analysis.” Computers & Industrial Engineering 35 (1-2): 213–16.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80.
Mo, Zhou, and Han Tao. 2016. “A Model of Oil Price Forecasting Based on Autoregressive and Moving Average.” In 2016 International Conference on Robots & Intelligent System (ICRIS), 22–25. IEEE.
Paul, Ranjit Kumar, and Sandip Garai. 2021. “Performance Comparison of Wavelets-Based Machine Learning Technique for Forecasting Agricultural Commodity Prices.” Soft Computing 25 (20): 12857–73.
Van Houdt, Greg, Carlos Mosquera, and Gonzalo Nápoles. 2020. “A Review on the Long Short-Term Memory Model.” Artificial Intelligence Review 53: 5929–55.
Wadhvani, Rajesh et al. 2017. “Review on Various Models for Time Series Forecasting.” In 2017 International Conference on Inventive Computing and Informatics (ICICI), 405–10. IEEE.
Zhu, Guangxuan, Hongbo Zhao, Haoqiang Liu, and Hua Sun. 2019. “A Novel LSTM-GAN Algorithm for Time Series Anomaly Detection.” In 2019 Prognostics and System Health Management Conference (PHM-Qingdao), 1–6. IEEE.